10 research outputs found

    Hurricane Forecasting: A Novel Multimodal Machine Learning Framework

    This paper describes a machine learning (ML) framework for tropical cyclone intensity and track forecasting that combines multiple distinct ML techniques and diverse data sources. Our framework, which we refer to as Hurricast (HURR), combines distinct data processing techniques using gradient-boosted trees and novel encoder-decoder architectures, including CNN, GRU, and Transformer components. We propose a deep-feature extractor methodology to mix spatial-temporal data with statistical data efficiently. Our multimodal framework makes it possible to base forecasts on a wide range of data sources, including historical storm data and visual data such as reanalysis atmospheric images. We evaluate our models against current operational forecasts in the North Atlantic and Eastern Pacific basins over 2016-2019 for a 24-hour lead time, and show that our models consistently outperform statistical-dynamical models and compete with the best dynamical models, while computing forecasts in seconds. Furthermore, including Hurricast in an operational forecast consensus model leads to a significant improvement of 5% - 15% over NHC's official forecast, highlighting its complementary properties with existing approaches. In summary, our work demonstrates that combining different data sources and distinct machine learning methodologies can lead to superior tropical cyclone forecasting. We hope that this work opens the door for further use of machine learning in meteorological forecasting. Comment: Under revision by the AMS' Weather and Forecasting journal.

    InsectUp: Crowdsourcing Insect Observations to Assess Demographic Shifts and Improve Classification

    Insects play such a crucial role in ecosystems that a shift in the demography of just a few species can have devastating consequences at environmental, social, and economic levels. Despite this, evaluation of insect demography is strongly limited by the difficulty of collecting census data at sufficient scale. We propose a method to gather and leverage observations from bystanders, hikers, and entomology enthusiasts in order to provide researchers with data that could significantly help anticipate and identify environmental threats. Finally, we show that there is indeed interest on both sides in such a collaboration. Comment: Appearing at the International Conference on Machine Learning, AI for Social Good Workshop, Long Beach, United States, 2019. Appearing at the International Conference on Computer Vision, AI for Wildlife Conservation Workshop, Seoul, South Korea, 2019. 5 pages, 6 figures.

    Holistic Deep Learning

    There is much interest in deep learning to solve challenges in applying neural network models in real-world environments. In particular, three areas have received considerable attention: adversarial robustness, parameter sparsity, and output stability. Despite numerous attempts to solve these problems independently, little work addresses them simultaneously. In this paper, we address the problem of constructing holistic deep learning models by proposing a novel formulation that solves these issues in combination. Real-world experiments on both tabular and MNIST datasets show that our formulation can simultaneously improve accuracy, robustness, stability, and sparsity over traditional deep learning models. Comment: In preparation for Machine Learning.

    TabText: A Flexible and Contextual Approach to Tabular Data Representation

    Tabular data is essential for applying machine learning across various industries. However, traditional data processing methods do not fully utilize all the information available in the tables, ignoring important contextual information such as column header descriptions. In addition, pre-processing data into a tabular format can remain a labor-intensive bottleneck in model development. This work introduces TabText, a processing and feature extraction framework that extracts contextual information from tabular data structures. TabText addresses processing difficulties by converting the content into language and utilizing pre-trained large language models (LLMs). We evaluate our framework on nine healthcare prediction tasks, including patient discharge, ICU admission, and mortality. We show that 1) applying our TabText framework enables the generation of high-performing and simple machine learning baseline models with minimal data pre-processing, and 2) augmenting pre-processed tabular data with TabText representations improves the average and worst-case AUC performance of standard machine learning models by as much as 6%.
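
The idea of "converting the content into language" can be illustrated with a minimal sketch. This is not the TabText implementation: the function name `row_to_text`, the "label is value" template, and the example columns are all assumptions made for illustration, and the step of passing the text to a pre-trained LLM is omitted.

```python
def row_to_text(row, column_descriptions=None):
    """Serialize one tabular record into a sentence-like string.

    Hypothetical helper: the "<label> is <value>" template is an assumption,
    not the template used by the TabText authors.
    """
    column_descriptions = column_descriptions or {}
    parts = []
    for column, value in row.items():
        if value is None:
            continue  # skip missing values rather than inventing text for them
        # Prefer the human-readable column description when one is available.
        label = column_descriptions.get(column, column)
        parts.append(f"{label} is {value}")
    return ("; ".join(parts) + ".") if parts else ""


# Illustrative record with made-up column names and values.
example_row = {"age": 67, "icu": "yes", "hr": None}
descriptions = {
    "age": "patient age in years",
    "icu": "admitted to ICU",
    "hr": "heart rate",
}
text = row_to_text(example_row, descriptions)
print(text)
```

In a full pipeline, a string like this would be embedded by a language model and the embedding used as features, which is the part the abstract attributes to pre-trained LLMs.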

    Over-MAP: Structural Attention Mechanism and Automated Semantic Segmentation Ensembled for Uncertainty Prediction

    Both theoretical and practical problems in deep learning classification require solutions for assessing uncertainty prediction, but current state-of-the-art methods in this area are computationally expensive. In this paper, we propose a new confidence measure dubbed Over-MAP that utilizes a measure of overlap between structural attention mechanisms and segmentation methods, which is of particular interest in accurate fine-grained contexts. We show that this classification confidence increases with the degree of overlap. The associated confidence and identification tools are conceptually simple, efficient, and of high practical interest as they allow for weeding out misleading examples in training data. Our measure is currently deployed in the real world on widely used platforms to annotate large-scale data efficiently.

    Gradient-Based Localization and Spatial Attention for Confidence Measure in Fine-Grained Recognition using Deep Neural Networks

    Both theoretical and practical problems in deep learning classification benefit from assessing uncertainty prediction. However, current state-of-the-art methods in this area are computationally expensive: for example, the general method for uncertainty estimation in deep learning of Loquercio et al. (2020) relies on Monte-Carlo sampling. We propose a new, efficient confidence measure, later dubbed Over-MAP, that utilizes a measure of overlap between structural attention mechanisms and segmentation methods. It does not rely on sampling or retraining. We show that the classification confidence increases with the degree of overlap. The associated confidence and identification tools are conceptually simple, efficient, and of high practical interest as they allow for weeding out misleading examples in training data. Our measure is currently deployed in the real world on widely used platforms to annotate large-scale data efficiently.
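
A measure of overlap between an attention map and a segmentation mask can be sketched as follows. The abstract does not specify the overlap measure, so this example assumes intersection-over-union (IoU) on a thresholded attention map; the function name `overlap_score`, the threshold value, and the toy arrays are hypothetical.

```python
def overlap_score(attention, mask, threshold=0.5):
    """IoU between a thresholded attention map and a binary segmentation mask.

    Assumption: IoU stands in for the paper's unspecified overlap measure.
    Both inputs are 2D lists of equal shape.
    """
    intersection = 0
    union = 0
    for att_row, mask_row in zip(attention, mask):
        for a, m in zip(att_row, mask_row):
            attended = a >= threshold   # pixel counts as attended if weight is high
            segmented = bool(m)         # pixel belongs to the segmented object
            intersection += attended and segmented
            union += attended or segmented
    # No attended or segmented pixels at all: report zero overlap.
    return intersection / union if union else 0.0


# Toy 3x3 example: three attended pixels, all inside a four-pixel mask.
attention = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
    [0.1, 0.0, 0.0],
]
mask = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]
score = overlap_score(attention, mask)
print(score)  # 3 shared pixels out of 4 in the union
```

Under this reading, a higher score would correspond to higher classification confidence, and no sampling or retraining is needed to compute it.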

    Geo-Spatiotemporal Features and Shape-Based Prior Knowledge for Fine-grained Imbalanced Data Classification

    Copyright by the authors. All rights reserved to authors only. Correspondence to: ckantor (at) stanford [dot] edu. Fine-grained classification aims at distinguishing between items that share a similar global appearance and patterns but differ in minute details. The primary challenges come from both small inter-class variations and large intra-class variations. In this article, we propose to combine several innovations to improve fine-grained classification within the use case of wildlife, which is of practical interest for experts. We utilize geo-spatiotemporal data to enrich the picture information and further improve the performance. We also investigate state-of-the-art methods for handling the imbalanced data issue.